An LSH Index for Computing Kendall's Tau over Top-k Lists

Authors

  • Koninika Pal
  • Sebastian Michel
Abstract

We consider the problem of similarity search within a set of top-k lists under the Kendall's Tau distance function. This distance describes how related two rankings are in terms of concordantly and discordantly ordered pairs of items. As top-k lists are usually very short compared to the global domain of possible items to be ranked, creating an inverted index to look up overlapping lists is possible, but it does not capture the similarity measure tightly enough. In this work, we investigate locality sensitive hashing schemes for the Kendall's Tau distance and evaluate the proposed methods using two real-world datasets.
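To make the notion of concordant and discordant pairs concrete, the following is a minimal sketch of a Kendall's Tau distance between two top-k lists. The abstract does not state which variant the authors use, so this sketch assumes the generalized form commonly attributed to Fagin et al., in which an item missing from a list is treated as ranked below every item in that list. The function name `kendall_tau_topk` and the penalty parameter `p` are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def kendall_tau_topk(a, b, p=0.0):
    """Generalized Kendall's Tau distance between two top-k lists.

    Assumptions (not from the abstract): lists hold distinct items,
    best-ranked first; an item absent from a list is treated as ranked
    below all items in that list; a pair whose relative order cannot be
    inferred in one list (both items absent from it) costs `p`.
    """
    rank_a = {item: pos for pos, item in enumerate(a)}
    rank_b = {item: pos for pos, item in enumerate(b)}
    missing = float("inf")  # rank given to items absent from a list

    # Union of items, preserving first-seen order.
    items = list(dict.fromkeys(list(a) + list(b)))

    dist = 0.0
    for i, j in combinations(items, 2):
        ai, aj = rank_a.get(i, missing), rank_a.get(j, missing)
        bi, bj = rank_b.get(i, missing), rank_b.get(j, missing)
        if ai == aj or bi == bj:
            # Both items are absent from one of the lists: order unknown.
            dist += p
        elif (ai < aj) != (bi < bj):
            # The two lists order this pair differently: discordant.
            dist += 1.0
    return dist


if __name__ == "__main__":
    print(kendall_tau_topk([1, 2, 3], [2, 1, 4]))
```

With `p = 0`, pairs whose order is unknown in one list are ignored; with `p = 0.5`, each such pair contributes half a discordance. Which setting is appropriate depends on the application.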

Similar resources

RankReduce - Processing K-Nearest Neighbor Queries on Top of MapReduce

We consider the problem of processing K-Nearest Neighbor (KNN) queries over large datasets where the index is jointly maintained by a set of machines in a computing cluster. The proposed RankReduce approach uses locality sensitive hashing (LSH) together with a MapReduce implementation, which by design is a perfect match as the hashing principle of LSH can be smoothly integrated in the mapping p...

A New Weighted Rank Correlation

Problem Statement: There have been many cases in real life where two independent sources have ranked n objects, with the interest focused on agreement in the top rankings. Spearman's rho and Kendall's tau coefficients assigned equal weights to all rankings. As a result, the literature proposed several weighted correlation coefficients with emphasis on the top rankings, including the top-down, w...

Estimation of Kendall's tau from censored data

This paper considers the nonparametric estimation of Kendall's tau for bivariate censored data. Under censoring, there have been some papers discussing the nonparametric estimation of Kendall's tau, such as Wang and Wells (2000), Oakes (2008) and Lakhal, Rivest and Beaudoin (2009). In this article, we consider an alternative approach to estimate Kendall's tau. The main idea is to replace a cens...

A Clustered Index Approach to Distributed XPath

Supporting top-k queries over distributed collections of schemaless XML data poses two challenges. While XML supports expressive query languages such as XPath and XQuery, these languages require schema knowledge so as to write an appropriate query which may not be available in distributed systems with autonomous and dynamic sources. Thus, there is a need for approximate query processing. Furthe...

Abstract structure of partial function $*$-algebras over semi-direct product of locally compact groups

This article presents a unified approach to the abstract notions of partial convolution and involution in $L^p$-function spaces over semi-direct product of locally compact groups. Let $H$ and $K$ be locally compact groups and $\tau: H \to \mathrm{Aut}(K)$ be a continuous homomorphism. Let $G_\tau = H \ltimes_\tau K$ be the semi-direct product of $H$ and $K$ with respect to $\tau$. We define left and right $\tau$-c...

Journal:
  • CoRR

Volume abs/1409.0651

Publication year 2014